ABSTRACT

One of the most prominent technical challenges to 
effective deployment of health management systems is 
the vast difference in user objectives with respect to 
engineering development. In this paper, a detailed 
survey on the objectives of different users of health 
management systems is presented. These user 
objectives are then mapped to the metrics typically 
encountered in the development and testing of two 
main systems health management functions: diagnosis 
and prognosis. Using this mapping, the gaps between 
user goals and the metrics associated with diagnostics 
and prognostics are identified and presented with a 
collection of lessons learned from previous studies that 
include both industrial and military aerospace 
applications. 


1. INTRODUCTION 

A detailed survey on the objectives of users of health 
management systems is presented. These user 
objectives and associated metrics are identified across 
operational, regulatory and engineering domains for 
diagnosis and prognosis algorithms and systems. 

This survey was sponsored by NASA’s Aviation Safety 
Program, Integrated Vehicle Health Management 
(IVHM) Project to aid in identifying critical gaps 
within their existing research portfolio that are not 
currently being addressed by the broader research
community. 

The origins of Integrated Vehicle Health Management 
are complex and entwined in aerospace developments 
by the National Aeronautics and Space Administration 
(NASA) and the Department of Defense (DoD). 
Johnson [4] has identified various roots of the 
expression integrated vehicle health management 
dating back to the 1970s for system monitoring and on- 
board fault protection, the 1960s for failure analysis 
and modeling and even as far back as the 1950s for 
reliability analysis. The coining of the term “integrated 
diagnostics” can be traced to the Air Force in 1980 [5]. 
Nearly three decades have since passed and the 
adoption within commercial aviation of comprehensive 
and integrated diagnostic and prognostic systems is still 
slowly progressing. There are examples of sub-systems 
that have been matured and successfully deployed [6, 
7], but examples of comprehensive and integrated 
systems are still sparse. The next section discusses 
some examples in more detail. 

Johnson [4] defined Integrated System Health 
Engineering and Management (ISHEM) as “the 
processes, techniques, and technologies used to design, 
analyze, build, verify, and operate a system to prevent 
faults and/or mitigate their effects.” A definition that 
contains a stronger notion of the predictive qualities of 
prognostics is provided in [8]: “Integrated Vehicle
Health Management is the unified capability of an
arbitrarily complex system of systems to accurately 
assess the current state of member system health, 
predict some future state of the health of member 
systems, and assess that state of health within the 
appropriate framework of available resources and 
operational demand.” Wilmering [8] goes on to identify 
the major capabilities of an IVHM system: 

• “Built-in-Test (BIT) - an embedded diagnostic
capability 

• Diagnostics - the process of determining the state or 
capability of a component to perform its function(s) 

• Prognostics - predictive diagnostics; determining the 
remaining life or time span of the proper operation of a 
component 

• Health Monitoring - the process of monitoring the 
state or condition of a component. 

• Health Management - the capability to make 
appropriate decisions about maintenance actions based 
on diagnostics/prognostics information, available 
resources and operational demand.” 

We broaden this decision making associated with 
Health Management to include mitigation, recovery and 
reconfiguration actions. We will interchangeably use 
the terms IVHM and ISHM (Integrated System Health
Management) as meaning an integrated system 
consisting of an information system (possibly 
distributed) for collecting, analyzing and accessing data 
obtained from on-board aircraft sensors, diagnostic and 
prognostic systems. The term diagnostic is used to 
define a system or sub-system capable of detecting and 
identifying faults and their locations. “The term
prognostics in the context of the Prognostic Health
Management (PHM) field specifically refers to the
ability to detect, isolate and diagnose mechanical and
electrical faults in asset components as well as
determine the accurate remaining useful life (RUL) of
those components.” [9]. The health management 
systems under discussion include capabilities that 
encompass both diagnosis and prognosis such as 
autonomic logistics. “Autonomic Logistics (AL) is 
simply the application of automation to locating and 
ordering repair parts so that they are available when 
needed.” [10].

One of the possible reasons for slow adoption of 
integrated health management systems is the vast 
difference in user objectives with respect to engineering 
development. In this paper, we present a survey of the 
objectives of different users of integrated health 
management systems, how they each would measure 
success of such systems (metrics), and how these 
objectives and metrics relate to engineering R&D 
efforts developing prognostic and diagnostic algorithms 
and systems. 

Section II gives background on the application of 
health management in the aviation domain; Section III 
discusses the motivation and competing challenges for 
HM; Section IV presents objectives and metrics for 
different HM users; Section V describes metrics 
associated with development and operation of 
diagnostic and prognostic systems; finally, sections VI 
and VII provide discussion and summary, respectively,
of the topics in this paper. 


2. BACKGROUND 

The first generation health management system (as 
exemplified in B727, DC-9/MD-80, B737 classic) 
consisted of “push-to-test” functionality of mechanical 
and analog systems in which a button was pressed to 
test internal circuitry and simple status lights would 
illuminate the go/no-go results for the device under test. 
The second generation (B757/767, B737NG, MD-90, 
A320) saw the use of black-box digital systems to carry 
out the health management functions previously 
performed by mechanical and analog systems. The 
third generation (MD-11, B747-400) saw the
introduction of systems implementing the ARINC
Standard 604, “Guidance for Design and Use of Built- 
In Test Equipment.” Early third generation systems 
allowed centralized access to the federated avionics 
BIT results but required manual consolidation of Line 
Replaceable Unit (LRU) fault indications. Later third 
generation systems used Central Maintenance 
Computers (CMCs) to aggregate all fault indications 
and perform root cause analysis via complex logic- 
based equations. The ability to downlink fault results to 
ground stations while en route was also added. Lessons 
learned were incorporated into updated standards, 
including ARINC 624, “Design Guidance for Onboard 
Maintenance System.” The fourth generation 
implements improved health management functionality 
through the use of modular avionics. In contrast to 
having specific avionics functions associated with a 
LRU, multiple avionics functions are associated with 
Line Replaceable Modules (LRMs). The health 
management system employed on the Boeing 777 
represents the fourth generation in the evolution of 
vehicle health management [11]. The Boeing 777
Airplane Information Management System (AIMS) 
integrates two key diagnostic subsystems: the Central 
Maintenance Computing Function (CMCF), which 
diagnoses faults after they happen, and the Airplane 
Condition Monitoring Function (ACMF), which 
collects data to allow prediction of future problems and 
thus enables condition-based maintenance. In contrast 
to the logic equation-based diagnostics in previous 
health management systems, the CMCF in the Boeing 
777 employs model-based reasoning in an attempt to 
overcome difficulties in developing and maintaining 
the health management functions. Subsequent 
developments have extended the scalability and 
extensibility of the modular avionics systems and the 
associated health management functionality. Despite 
the advances over the years, there are still difficulties in 
developing and implementing health management 
systems that meet user requirements [12], although 
these difficulties may be more programmatic than 
technical. 

MacConnell [13] conducted an extensive working 
group study on the benefits of ISHM consisting of 
representatives from the Air Force Research Laboratory 
(AFRL), Boeing, General Electric, Honeywell, 
Lockheed Martin, Northrop Grumman, United 
Technologies, Pratt & Whitney and others. New 
benefits were identified that may be perceived as more 
indirect. For example, automated monitoring could be 
relied on to dramatically reduce factors of safety for
design and to enable revolutionary certification 
processes. The working group [13] ranked the relative 
importance of the functional areas in ISHM benefits. 
The top five were diagnostics, analysis, design, 
structure and propulsion. This is a mix of health 
management functions (diagnostics, analysis) with 
application areas (structure, propulsion). This 
highlights that the words used (ontology) amongst even 
a group of experts can cause opacity in health 
management discussions thus making it difficult to 
clearly outline the requirements driving the 
development and integration of fleet wide health 
management systems. 

Ofsthun [14] gives a succinct overview of developing
IVHM for aerospace platforms, pointing out that 
traditional built-in-tests generally have not provided the 
accuracy or reliability needed to impact operational 
efficiency and maintenance. A goal of IVHM should 
be to both improve and extend traditional BIT 
approaches in subsystems such as avionics, electrical 
(including wiring), actuators, environmental control, 
propulsion, hydraulics, structures as well as overall 
system performance. 

Ofsthun [14] highlights IVHM lessons learned that are
similar to the points made in this article in
relating user community goals to diagnostic and
prognostic modeling metrics. Our article highlights:

• IVHM performance measures need to be derived by 
an integrated product development team that accounts 
for all expected user groups. 

• Cost/benefit analyses need to be conducted for each 
expected user group during requirements definition. 

• A common health management infrastructure is 
needed to integrate across subsystems - including 
definition of subsystem responsibilities. 

• Trade-space analyses need to be conducted between 
failure detection and false alarm rates - including crew 
enabled filtering. 

• Verification and validation of the IVHM system needs to
include incremental validation by demonstrations as 
well as opportunistic monitoring. 
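
The trade-space between failure detection and false alarm rates listed above can be illustrated with a small threshold sweep. The following is only a sketch under stated assumptions: the health-indicator scores, the fault labels and the candidate thresholds are hypothetical values chosen for illustration, not data from any of the cited programs.

```python
from typing import List, Tuple


def detection_and_false_alarm_rates(
    scores: List[float], is_fault: List[bool], threshold: float
) -> Tuple[float, float]:
    """Return (detection rate, false alarm rate) when alarming on score >= threshold."""
    faulty = [s for s, f in zip(scores, is_fault) if f]
    healthy = [s for s, f in zip(scores, is_fault) if not f]
    detection_rate = sum(s >= threshold for s in faulty) / len(faulty)
    false_alarm_rate = sum(s >= threshold for s in healthy) / len(healthy)
    return detection_rate, false_alarm_rate


# Hypothetical indicator values: a higher score means "more likely faulty".
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.5, 0.7, 0.8, 0.9, 0.35]
is_fault = [False, False, False, False, False, True, True, True, True, True]

for threshold in (0.3, 0.5, 0.7):
    detected, false_alarms = detection_and_false_alarm_rates(scores, is_fault, threshold)
    print(f"threshold={threshold:.1f} detection={detected:.2f} false_alarm={false_alarms:.2f}")
```

Raising the threshold in this toy data lowers the false alarm rate at the price of missed detections, which is the kind of trade a requirements team would need to settle with each user group.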

Currently the best developing example of a highly 
integrated system for health management is the Joint 
Strike Fighter program (JSF) which mandates such a 
development [15]. One of the greatest challenges in 
developing a health management system from the 
ground up has been in refining the user objectives and 
requirements to an adequate level that includes buy-in 
from the expected and varied user groups. 

3. MOTIVATION 

Wide-spread adoption of integrated health management 
has been slow due to competing factors that have to be
satisfied within the HM user community. Two areas
stand out in this regard: Aging and Expected Life and 
Cost vs. Benefit. 

3.1 Aging and Expected Life 

As the average age of air fleets begins to exceed
the originally expected useful life, it becomes necessary,
in order to preserve safety-of-flight (SOF), to increase
the frequency and depth of inspections. This results in
an increase in maintenance costs as well as longer 
periods of downtime. One of the benefits of an ISHM 
system that includes structural health monitoring is that 
this inspection burden can be reduced by relying upon 
continuous monitoring [16]. The USAF has deployed 
structural monitoring systems that allow for the 
required maintenance inspection interval to be tailored 
to each aircraft, which has resulted in reducing the 
inspection burden, costs and amount of downtime. 

One might be tempted to suggest that if the average age 
of an air fleet (either military or commercial) is starting 
to exceed the expected life, then a possible strategy to 
reduce the average age would be to begin replacement 
of the oldest with new aircraft. Unfortunately, 
especially in the case of the U.S. DoD, with given 
budgets it would not be possible to decrease the 
average age enough to make a difference. This is also 
true in civilian fleets: “The statistics show that the
number of aging aircraft (older than 15 years) has
increased continuously. This number was around 4600 
in 1997 for US and European built civil aircraft flown 
with more than 1900 aircraft older than 25 years. This 
number increased to 4730 (>15 years) and 2130 (>25 
years) respectively in 1999” [17]. 

From an engineering perspective, the development of
health management systems designed to mitigate the
greatest risks is dependent upon accurate data
collection. The data needed for maturation analysis is
usually difficult both to obtain (due to heterogeneous
systems) and to collect: “this makes access,
retrieval, and integration of the requisite information a
costly and often incomplete process at best” [8].

3.2 Cost vs. Benefit 

Integrated health management systems
incur development, installation and life cycle costs.
Some of the costs associated with a health management 
solution include maintenance of the health management 
system components (such as sensor replacement and 
software upgrades) as well as increases in system 
volume and mass requirements. These costs need to be
offset by expected savings over the life of
the aircraft, as shown through a rigorous cost benefit analysis
(CBA). The slow acceptance of health management
tools has been attributed to incomplete total life
cycle systems engineering management [18]; that work
introduces an approach for proper system analysis
methods. Often the optimization of objectives involves
conflicting goals such as minimizing purchase cost
and maximizing availability [19]. Calculating costs
such as operating costs involves complex parameters
such as average downtime for unplanned repairs.

In spite of these challenges, different methods have 
been developed to analyze cost-benefit tradeoffs for 
designing and implementing IVHM systems. For 
example, [20] discusses the benefits of IVHM to five
different categories of operators: the Original
Equipment Manufacturers (OEMs), the mission
operators, command/control elements, fleet 
management, and maintenance operators. These five 
categories may overlap in organizational structure and 
personnel, but they have clearly identifiable processes 
and performance that can be measured. Another 
example of an approach to conducting a CBA for 
IVHM appears in [21]. Their methodology utilizes pre- 
existing reliability and logistics source information 
from a Failure Modes & Effects Criticality Analysis 
(FMECA), line maintenance activities and legacy field 
event rates. IVHM will have the greatest benefit when 
it is applied to those areas that are historically the least 
reliable, have failure modes that can greatly impact 
mission success, have sub-systems that are the most 
difficult to diagnose or for which replacement parts
cannot be obtained in a timely manner [22].

The impacts of diagnostic capability on unscheduled 
maintenance include [21]: 

• reduction of cannot duplicate rates 

• reduction of labor mean-time-to-detect (MTTD) 

• reduction of line replaceable unit (LRU) repair costs 

• reduction of repair times (increase availability) 

The benefits impacting scheduled maintenance include: 

• reduction of labor 

• reduction of maintenance induced failures 

• elimination of scheduled maintenance

Prognostic capabilities impacting operations include:

• reduction in number of engine in-flight shutdowns, 
mission aborts, lost sorties 

• reduction of secondary damage 

• ability to reconfigure and re-plan for optimal usage of 
the remaining useful life (RUL) of failing components 

• maximized usage of the component life while 
ensuring mission safety 

One example of cost-benefit quantification of ISHM in 
aerospace systems appears in [23]. Their methodology 
analyzes the trade-offs between system availability,
cost of detection, and cost of risk. In this optimization
formulation, cost of detection includes the cost of 
periodic inspection/maintenance and the cost of ISHM; 
cost of risk quantifies risk in financial terms as a 
function of the consequential cost of a fault and the 
probabilities of occurrence and detection. Increasing 
the ISHM footprint will generally lower cost of risk 
while raising cost of detection, while availability will 
increase or decrease based upon the balance of the 
reliability and detectability of the sensors added, versus 
their ability to reduce total maintenance time. 
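
The trade-off described above can be sketched numerically. The source does not give the functional form used in [23], so the code below makes a loud assumption: cost of risk is taken as the consequential cost times the probability of occurrence times the probability that the fault goes undetected. All option names and numbers are hypothetical, and availability is not modeled.

```python
from dataclasses import dataclass


@dataclass
class IshmOption:
    name: str
    inspection_cost: float  # periodic inspection/maintenance cost (hypothetical units)
    ishm_cost: float        # cost of the ISHM footprint itself
    p_detection: float      # probability that a fault is detected


def cost_of_detection(option: IshmOption) -> float:
    # Per the text: periodic inspection/maintenance cost plus ISHM cost.
    return option.inspection_cost + option.ishm_cost


def cost_of_risk(option: IshmOption, consequence_cost: float, p_occurrence: float) -> float:
    # ASSUMED form, not taken from [23]: expected cost of an undetected fault.
    return consequence_cost * p_occurrence * (1.0 - option.p_detection)


def total_cost(option: IshmOption, consequence_cost: float, p_occurrence: float) -> float:
    return cost_of_detection(option) + cost_of_risk(option, consequence_cost, p_occurrence)


# Hypothetical design options with a growing ISHM footprint.
options = [
    IshmOption("baseline", inspection_cost=100.0, ishm_cost=0.0, p_detection=0.5),
    IshmOption("small footprint", inspection_cost=80.0, ishm_cost=50.0, p_detection=0.8),
    IshmOption("large footprint", inspection_cost=60.0, ishm_cost=200.0, p_detection=0.95),
]

best = min(options, key=lambda o: total_cost(o, consequence_cost=10_000.0, p_occurrence=0.05))
for o in options:
    print(o.name, round(cost_of_detection(o), 2), round(cost_of_risk(o, 10_000.0, 0.05), 2))
print("lowest total cost:", best.name)
```

In this toy setup the larger footprint keeps lowering cost of risk while raising cost of detection, so the lowest total cost falls at an intermediate footprint.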

The business case for ISHM generated by an ISHM 
working group composed mostly of industry [13] 
resulted in the following rankings of benefits: 

1. Maintenance time savings.

2. False alarm avoidance - reduce cannot duplicate
(CND) and retest okay (RTOK).

3. Availability Improvement - increase MTBMA - 
mean time between maintenance actions. 

4. Spares and supply savings. 

5. Recurring cost savings. 

In the past there have been many anecdotal accounts of 
the benefits of ISHM. Now some systems, such as the 
condition based maintenance helicopter programs 
(STAMIS, HUMS) are starting to produce real results. 
For example, in [24], implementing health management
in the UH-60 resulted in an increase in fully
mission capable status from 65% to 87%, resulting in an
increase in total flight hours from 10,331 to 21,819.

There are uncertainties inherent to new PHM systems,
such as the fact that not all faults will be diagnosed
correctly (PHM Effectiveness). Two factors may
detract from the benefits of prognostics [25]:

• Prognostics may cause some sub-systems to be 
replaced much earlier than their eventual failure thus 
reducing their useful life. This will require engineering 
resources to analyze replaced units in order to optimize 
replacement thresholds. 

• False prognostic replacement indicators may cause 
replaceable units to be replaced that are not in any 
danger of failing. This will require further engineering 
resources to mitigate these false alarms. 

The perceived and real difficulties of retrofitting legacy 
aviation systems with effective health management and 
the challenges of unambiguously quantifying the 
benefit in new systems have hampered more wide-spread
adoption of integrated health management. However, 
more and more, these technical and programmatic 
issues are being addressed within the health 
management community. 

4. USER OBJECTIVES AND METRICS

In order to present the ISHM user objectives and 
metrics we have chosen to broadly categorize types of 
users. There are many different ways to categorize 
these health management stakeholders. Our approach 
is shown in Figure 1. 

Our three top-level stakeholder categories are 
Operations, Regulatory and Engineering. In this article 
we will focus on looking at the user objectives derived 
from operations and how these impact the modeling 
efforts of the engineering R&D activities. 

Each category contains distinct user groups. Within operations we have
logistics, flight, maintenance, fleet management and 
training. Regulatory users are concerned mainly with 
establishing FAA amendments and new rules taking 
advantage of health management information. Within 
engineering we have sustaining, R&D and 
manufacturing. Although design engineers can be 
considered users of health management, due to space 
considerations we do not survey engineering design. 

In the remainder of this article we have chosen to 
highlight each identified user objective only once even 
if it may be attributable to multiple users. For example, 
reducing labor is an objective that spans multiple users 
but the associated user metric is universal - hours of 
labor. Our categorization also has forced boundaries 
between user groups that may cause some of the 
objectives to be split. For example, one of the user 
objectives for logistics is to reduce the mean time to 
repair. We have chosen to put this under logistics 
rather than under maintenance as in [26]. 



[Figure 1: diagram of stakeholder groups. Labels recovered: Logistics, Flight, Maintenance, Fleet Management, Training; Sustaining, R&D, Manufacturing.]

Fig 1. Categorization of groups driving health
management objectives.


Table 1. Logistics HM Goals and Metrics

| Logistics Goals | User Metrics | Map |
|---|---|---|
| L.1 Reduce repair turn-around time | Mean time to repair (MTTR), time delays waiting for parts | d, p |
| L.2 Reduce ground support equipment and personnel | Equipment value, volume, weight and number of personnel | d |
| L.3 Increase availability/decrease unscheduled maintenance | Mean time in service | p |
| L.4 Reduce labor | Labor-hours | d, p |
| L.5 Reduce periodic inspections | Frequency of periodic inspections | p |
| L.6 Predict remaining useful life in components, maximize component life usage and tracking | Accuracy in prediction, minimize false alarms | p |
| L.7 CBM - Schedule regular maintenance only as necessary - Predict remaining useful life in expendables (e.g. oil) | Prediction accuracy | p |
| L.8 Ease of using entire information system | Measure of integration and information access: data access, security, search, increase IS availability, decrease costs... | d, p |
| L.9 Increase surge capacities | Surge capacity | d, p |
| L.10 Reduce costs of reconfigurations and turn-arounds | Total $ spent on reconfigurations | d, p |
| L.11 Maximize vendor lead time | Lead time | p |
| L.12 Minimize inventory (just in time) | Measure of inventory (dollars, parts), rate of spare parts usage | d, p |

4.1 Logistics

DEFINITION: Logistics is the science of planning and 
executing the acquisition, movement and maintenance 
of resources necessary to sustain aeronautical 
operations. 

The bottom line for logistics is to make operations
faster, cheaper (less equipment, fewer personnel) and more
consistent and reliable (less uncertainty and more
predictability). This top-level view of logistics can be
translated into the user objectives and associated 
metrics listed in Table 1. All of the user objectives 
tables presented will have the rightmost column 
indicating whether the performance metric can be 
mapped into diagnostics (d), prognostics (p), both or 
neither. 
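
As a minimal sketch of how this mapping column could be represented and queried, the snippet below encodes a few rows transcribed from Table 1. The class name, field names and helper function are hypothetical and serve only to illustrate the goal-to-metric-to-function mapping.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class UserObjective:
    ident: str
    goal: str
    metric: str
    maps_to: FrozenSet[str]  # subset of {"d", "p"}: diagnostics, prognostics


# A few rows transcribed from Table 1 (not the full table).
LOGISTICS = [
    UserObjective("L.1", "Reduce repair turn-around time",
                  "Mean time to repair (MTTR)", frozenset({"d", "p"})),
    UserObjective("L.2", "Reduce ground support equipment and personnel",
                  "Equipment value, volume, weight", frozenset({"d"})),
    UserObjective("L.3", "Increase availability",
                  "Mean time in service", frozenset({"p"})),
    UserObjective("L.5", "Reduce periodic inspections",
                  "Frequency of periodic inspections", frozenset({"p"})),
]


def mapped_only_to(objectives: List[UserObjective], function: str) -> List[str]:
    """Return identifiers of objectives that map exclusively to one HM function."""
    return [o.ident for o in objectives if o.maps_to == frozenset({function})]


print(mapped_only_to(LOGISTICS, "p"))
print(mapped_only_to(LOGISTICS, "d"))
```

A structure like this makes it straightforward to list which user objectives depend solely on prognostic capability and which on diagnostic capability.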

In this table are objectives that would exist even 
without any health management solution such as 
reducing turn-around and repair times. Hopefully, 
these can be improved through the appropriate 
application of health management information. The 
issue of reducing ground support also exists whether or 
not we have a health management system. The central 
concept is that the diagnostic (fault type and location) 
information available will reduce the need for extensive 
ground test equipment and will reduce the time spent 
on facilitating repairs as well. 

Reducing the frequency of periodic inspections by 
relying upon more extensive system monitoring is 
starting to become a reality in the Air Force [16]. The 
individual aircraft tracking program (IAT) enables the 
development of an individualized aircraft specific 
maintenance schedule (including inspections) based on 
actual fatigue loads and/or crack lengths for each 
aircraft. 

Without IVHM, consumables (such as oil) are replaced
at a fixed schedule based upon expected usage. 
Condition based maintenance (CBM) [6, 7] has started 
using the operating regime to modify this replacement 
schedule and the inspection intervals. Heavy use will 
result in more frequent inspections and vice versa. 
Additionally, the actual condition of the 
consumable/expendable can be monitored either 
directly or indirectly based upon operating conditions. 
The rate of deterioration can be estimated and then the 
optimal replacement schedule predicted so that the 
operator can be notified in advance. This type of 
technology enables logistics to schedule service in 
advance at an optimal replacement schedule. 
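
The idea of estimating a rate of deterioration and predicting when a consumable should be replaced can be sketched as follows. This is an illustration under simplifying assumptions: the condition index, the replacement limit and the sample readings are hypothetical, and a straight-line trend is assumed; the source does not prescribe a particular model.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_limit(hours: Sequence[float], condition: Sequence[float], limit: float) -> float:
    """Estimate operating hours left before the condition index reaches the limit."""
    slope, intercept = fit_line(hours, condition)
    if slope >= 0:
        raise ValueError("no degradation trend in the data")
    hours_at_limit = (limit - intercept) / slope
    return max(0.0, hours_at_limit - hours[-1])


# Hypothetical monitoring data: condition index sampled every 10 flight hours.
flight_hours = [0, 10, 20, 30, 40]
oil_condition = [100, 95, 90, 85, 80]

remaining = hours_until_limit(flight_hours, oil_condition, limit=60)
print(f"estimated hours until replacement: {remaining:.1f}")
```

With an estimate like this, service could be scheduled in advance of the predicted limit crossing; a fielded system would also need to account for operating regime and for uncertainty in the estimate.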

A final point on logistics is the user objective for ease 
of use of the entire information system (IS). This 
includes ensuring that the appropriate people/teams 
have access to the appropriate information at the right 
time with sufficient data integrity and security. 
Unfortunately, many times the information system is 
thought of after the fact as merely a way to archive 
records. This lack of integration has been identified as 
a large reason for failure [27]. It should be noted that
measuring “ease of use” for an entire IS is very difficult
and subject to multiple, sometimes conflicting 
ideologies. Many measures associated with evaluating 
the usability of an enterprise system are subjective. 

The Air Force has set the objective of modernizing the 
information systems that underlie its logistics with the 
goal to increase IS equipment availability by 20% and 
reduce annual operational expenses by 10% [28]. 

The objectives and metrics associated with an 
information system that spans all aspects of aviation 
operations are far beyond the scope of this article. 
However, we will highlight some of the key aspects 
with respect to health management and how a user 
might assess: 

• asset tracking 

• individual aircraft condition assessment 

• demand management 

• lifecycle product management 

• integrated planning system 

• purchasing supply chain management 

• fleet decision management tools 

The Joint Strike Fighter program is developing 
autonomic logistics information system tools to 
integrate management systems (e.g. fleet &
maintenance) along with knowledge discovery tools
and anomaly and failure resolution systems. Since the
IS is responsible for enabling real-time information
flow between maintenance, training, supply and
mission planners as well as for providing data for
performance analytics, it can be considered the
backbone of logistics [10].

In the past, such large scale integrated IS 
implementations have failed for a number of reasons 
such as poor understanding of the requirements, 
immature products, limited testing in actual 
environments and underappreciating and undervaluing
the effort required for data cleanup [28]. Typically,
data useful for analytical modeling is contained in
multiple heterogeneous systems [8].

One difficulty of accurate maintenance data collection 
is more than just an information system issue - humans 
are the ones performing the maintenance actions and 
entering the maintenance data into the information 
system. In the past, the maintenance codes provided to 
maintenance technicians in both military and civilian 
sectors were rather coarse grained to enable easier entry 
during maintenance. This meant that during 
unscheduled maintenance debugging activities, there 
could be inaccuracies generated either from entering 
the closest (or most familiar) maintenance code or 
entering a premature, incorrect diagnosis. For example,
electrical wiring in the past was not considered as a 
separate system but rather was just the thing between 
reportable sub-systems. This meant that wiring 
problems were often underreported within the
maintenance database. This has been addressed by 
adding additional maintenance codes and making the 
definitions more precise with the adverse consequences 
of requiring even more labor and costing more time for 
maintenance technicians. 

Another aspect to ensuring the utility of the information 
system is through the use of common architectures, 
interoperability metrics, common standards and a clear 
path to implementation [29]. The U.S. Department of 
Defense (DoD) Architectural Framework (DoDAF) 
defines a standard way to organize an enterprise 
architecture (EA) into consistent views [30]. Other 
approaches [8] include ontological interchange 
standards such as KIF [31], product data oriented standards
such as STEP [32], and even more specific diagnostic 
information models such as AI-ESTATE [33]. 

In a similar vein, IEEE is also developing standards 
such as the Automatic Test Markup Language (ATML) 
[34] and the Software Interface for Maintenance 
Information Collection and Analysis (SIMICA) [35] 
[36] as a means to standardize the exchange of test 
information between automatic test equipment. 

From a lessons learned perspective on the JSF program, 
a well integrated information system has been 
identified as the most important lesson learned [27]. 
This lesson includes ensuring that ground systems are 
developed jointly with diagnostic systems and that on- 
board diagnostic algorithms are developed in a manner 
to ensure full system capability. 

There is a great difference between supply chain 
management for a large operation consisting of a 
uniform fleet and managing a very small number of 
highly unique and complex vehicles (such as NASA’s 
Shuttle Orbiter program). With a small number of 
vehicles requiring custom part specifications, the lead 
time to the vendors needs to be maximized, and having 
an inventory of such spare parts is advisable. In the 
case of large fleets where multiple sources are available 
for parts and supplies, a just in time inventory approach 
can help minimize waste and storage expenses. Turn- 
around time can be optimized through proper planning 
and use of analytical and prediction capabilities of 
fleets. 

4.2 Flight 

DEFINITION: The Flight category includes the pilots 
and flight crew as well as those responsible for Safety 
of Flight (SOF). 

The bottom line for flight objectives for health 
management systems is to only provide information 
that increases certainty for future actions and 
commands and increases safety of flight. 


Table 2 lists objectives related to flight. A clear
violation of the information certainty objective is false 
alarms - alerting the crew to a problem in a subsystem 
when the problem does not really exist. 

A second objective, also related to reducing uncertainty 
in the cockpit, is the objective not to have conflicting 
alarms - also known as dissonance [37], [38]. This 
objective can unfortunately be derived from a lesson
learned from a tragic flight accident. In July 2002 a
mid-air collision occurred between a Russian passenger
jet and a DHL cargo jet over Germany which resulted
in 71 deaths. Analysis of this accident revealed a
dissonance problem between an on-board alerting
system called the Traffic Alert and Collision Avoidance
System (TCAS) and an air traffic controller
(http://aviation-safety.net), whereby the TCAS
commanded the pilot to
gain altitude to avoid a collision and the control tower
commanded a decrease in altitude. Conflicting
signals, even if the pilot can prioritize, cause time
delays in executing the appropriate action.


Table 2. Flight User Goals and Metrics

| Flight Goals | User Metrics | Map |
|---|---|---|
| F.1 Minimize cockpit false alarm rate | Time between false alarms | d, p |
| F.2 Minimize cockpit information overload | # health management messages | d, p |
| F.3 Enable cockpit information filtering of critical alarms | Capability to filter - pilot satisfaction | d, p |
| F.4 Enable cockpit information filtering of non-critical alarms | Capability to filter - pilot satisfaction | d, p |
| F.5 Minimize alarm conflicts | # conflicting alarms | d, p |
| F.6 Minimize alarm dissonance | # alarms that have disparity between ATC and alarms | d, p |
| F.7 Maximize time from first alert to failure. | Time to failure or when safe landing becomes difficult. | d, p |
| F.8 Enhance Safety | # aborted flights | d, p |
| F.9 Enhance Safety | # smoke events | d, p |
| F.10 Enhance Safety | Passenger comfort complaint rate | d, p |


The flight crew also needs to have as much advanced 
knowledge of an imminent failure as practical [39]. In 
particular, pilots need to be alerted early enough that 
the fault can be resolved and control regained (if lost) 
or if the handling qualities are too severely degraded, 
the health management system should be able to 
augment vehicle control stability in conjunction with a 
damage adaptive controller to enable a safe emergency 
landing. 

The crew's ability to prioritize, although essential, is 
easily overloaded when either too much information is 
presented or when the most critical information is 
buried beneath layers of information or is not easily 
accessible (multiple sub-menus). This relates both to 
optimizing the number of health management 
messages sent to the crew and to allowing the crew to 
filter the less critical messages appropriately. 

There is a lack of understanding in the community as to 
“how good is good enough” and “how good can we 
get” with respect to fault diagnosis. This is intimately 
connected with practical issues such as performance 
metrics and false alarm rates. In the past, on-board 
diagnostic systems have had a terrible record of 
costing more than was saved. For example, in [40] it is 
documented that built-in tests (BIT) caused wasted 
cannot duplicate (CND) maintenance on the order of 
85,639 maintenance man-hours and 25,881 hours of 
unnecessary aircraft downtime. This issue has plagued 
the F/A-18E/F and the V-22 Osprey [41]. 

Of course the top priority of the flight crew is safety. 
Typically safety can be measured in terms of the 
number of aborted flights, number of National 
Transportation Safety Board (NTSB) incident and 
accident reports, number of smoke events (when the 
smell or sight of smoke is present), and number of 
passenger comfort complaints (air quality, water 
quality, temperature, etc.). 

As the safety of air transportation continues to improve, 
the impact of health management systems on safety 
becomes increasingly difficult to measure. 
Nevertheless, the introduction of health management 
technology should always be required to improve 
safety. There is always risk from the introduction of 
technology that needs to be weighed and mitigated so 
that safety margins are always improving. 

4.3 Maintenance 

DEFINITION: Maintenance health management users 
are defined as the personnel in the depots and in the 
field responsible for repairing and servicing the aircraft. 

The bottom line for maintenance is to return an aircraft 
to service as quickly and as inexpensively as possible 
while minimizing repeated repairs. 

The most costly and most time-consuming type of 
faults are intermittent faults seen during flight that 
cannot be duplicated (CND) in the maintenance depot. 
These faults may not be discovered by static depot 
tests, because the dynamic environment of flight may 
cause faults that only manifest in flight. These types of 
faults result in subsystems (e.g. Line Replaceable Units 
- LRUs) being pulled for testing unnecessarily, resulting 
in excessive inventory of parts that retest OK (RTOK), 
excessive time spent on testing and trying to diagnose 
LRUs that actually are not faulty, and test flights trying 
to pin down the correct diagnosis. Health management 
systems hold the allure that a correct diagnosis (fault 
type and location) can be provided without intervention 
by the maintenance personnel. This would both reduce 
the incidents of CNDs and RTOKs and reduce 
the required labor. Table 3 contains a sampling of the 
maintenance objectives. 


Table 3. Maintenance User Goals and Metrics 

Maintenance Goals | User Metrics | Map 
M.1 Decrease incidents of cannot duplicate (CND) logs and retests OK (RTOK) | # CNDs | d 
M.2 Reduce failures | MTBF | p 
M.3 Increase operation after non-critical faults | Time of operation after non-critical fault | p 
M.4 Reduce damage incurred | # damage incidents logged as caused by maintenance | p 
M.5 Reduce maintenance look-up time | Time to access maintenance manuals and records | d 
M.6 Identify fault location | distance to fault in wiring, LRC identification | d 
M.7 Reduce health management system maintenance | Hours spent on diagnosing and repairing the health management system | d, p 
M.8 Maximize fault coverage | Percentage of detectable faults | d 


One of the greatest sources of faults for Electrical 
Wiring and Interconnect Systems (EWIS) is 
poor maintenance practices [42]. For example, if a new 
wire needs to be run, rather than unscrew the wire 
clamps and undo the wire ties along the harness it is 
quicker to just push the wire through the clamps and 
ties if it will fit. This can make the wire clamps too 
tight, pinching all of the wires. Over time this can result 
in abrasion and breakage internal to a wire. Although 
this problem does not manifest right away, bad practices 
such as this can reduce the average fleet mean time 
between failures (MTBF) values. 

A prognostic capability within a health management 
system provides the capability to predict and trend 
degradation before eventual failure occurs. The ability 
for maintenance to reduce subsystem failures by repair 
and/or replacement prior to failure can be measured in 
terms of mean time between failures. In the case of the 
electrical wiring issue, a future electrical diagnostic 
system could sense the abnormal wear within the 
pinched wires. The prognostic system could then form 
an estimate as to when the stressed wires would need to 
be replaced to avoid interruption in service. 

An ISHM system also holds the promise to reduce 
maintenance turn-around time by identifying the 
location of the fault. For discrete state based systems, 
the fault coverage can be extensive and enumerated. 
Fault coverage for analog parameters is much more 
difficult to ensure than for discrete domains. 
Typically, due to the continuous nature of the range of 
parametric faults along with the inherent masking effect 
of process variations, there tends to be a range of faults 
that are not entirely detectable. This grows worse 
as variance increases. Two novel test metrics for 
parameter fault coverage (PFC) are introduced in [43]: a 
guaranteed parameter fault coverage (GPFC) obtained by 
a deterministic method, which is the guaranteed lower 
bound of PFC, and a partial parameter fault coverage 
(PPFC), which is the probabilistic component of PFC. 
The details of these metrics can be found in [43]. 


Table 4. Fleet Management Goals and Metrics 

Fleet Management Goals | User Metrics | Map 
FM.1 Life extension - in service beyond expected service life | Years past retirement | p 
FM.2 Decrease unscheduled maintenance | Hours of unscheduled maintenance | p 
FM.3 Easily reconfigurable | Time to respond to mission change | 
FM.4 Efficiency | Fuel used vs. cargo/people transported | d, p 
FM.5 Vehicle targeted CBM | (HUMS examples) | d, p 
FM.6 Decrease ops costs (RMO) | operating expenses | p 
FM.7 Increase availability | mean turn-around time | d, p 
FM.8 Provide surge capacity | surge capacity | p 
FM.9 Spare part usage analytics | Percent accuracy on part usage predictions | p 
FM.10 Aid business and regulatory decisions | | d, p 
FM.11 Improve design and qualifications | | d, p 


4.4 Fleet Management 

DEFINITION: Fleet management health management 
users are defined as those involved with making fleet 
wide decisions affecting life extension, operational 
costs (RMO) and future planning. 

The bottom line for fleet management is to maximize 
adaptability, availability and mission success while 
minimizing costs and resource usage. 

Fleet managers interact with the health management 
system at a higher level of abstraction than do the other 
users. The accuracy of the analytics and system 
assessments is even more critical at this level due to the 
large consequence of a single misinformed decision. 
Since fleet management is at such a high level it 
encompasses the users that we have previously 
examined such as logistics, flight and maintenance. 
Table 4 summarizes the objectives of fleet management. 

Integral to fleet management is the use of decision 
support systems within an integrated information 
system. Decision support systems aid business and 
regulatory decisions and improve design and 
qualifications by emphasizing specific query, reporting 
and analysis capabilities [44]. Such systems are used 
by fleet owners and operators as well as by original 
equipment manufacturers (OEMs) (e.g. for warranty 
calculations). They also impact regulatory affairs by 
giving fleet managers the information necessary to 
adhere to strict regulatory inspection intervals and 
minimize fleet-wide disruptions. 

5. DIAGNOSTIC & PROGNOSTIC SYSTEMS 

The health management system user objectives and 
metrics will next be related to those metrics associated 
with development and operation of diagnostic and 
prognostic systems. Many of these user objectives will 
map into both diagnostic and prognostic metrics; others 
will not map into either. An extensive survey on 
diagnostic metrics was conducted in [1]. The primary 
results from this survey are presented in Table 5. 
Surveys on prognostic metrics, including suggestions 
for new prognostic metrics, are presented in [2] and 
[3]. Readers wishing for more insight into performance 
measures for diagnostics and prognostics are directed to 
[1], [2] and [3] and the references contained 
therein. 

5.1 Diagnostics 

DEFINITION: Diagnosis is the detection and 
determination of the root cause of a symptom. 

The bottom line for diagnostics is to detect and isolate 
faults in a timely and accurate manner with sufficient 
resolution so as to identify the specific faulty 
component. 


The objectives and associated metrics for diagnostics 
taken from [1] are summarized in Table 5. The 
diagnostic objectives have been grouped into two 
categories: detect and isolate. Within each of these 
categories are objectives related to response time, 
accuracy, sensitivity/resolution and robustness. The 
previously presented user objectives for logistics, 
flight, maintenance, fleet management and training can 
be related to the diagnosis objectives and metrics in 
Table 5. This mapping of user goals to 
diagnostics is summarized in Table 6. The purpose of 
this mapping is to present the relationship between 
published user objectives and the performance 
measures used to drive diagnostic algorithm research 
and development. 


Table 5. Diagnostic Metrics [1] 

Type | Diagnostic Objectives | Model Metrics 
Detect | Time | Response time to detect 
Detect | Accuracy | Detection false positive rate 
Detect | Accuracy | Detection false negative rate 
Detect | Accuracy | Fault detection rate 
Detect | Accuracy | Fault detection accuracy 
Detect | Sensitivity | Detection sensitivity factor 
Detect | Stability | Detection stability factor 
Isolate | Time | Time to isolate 
Isolate | Time | Time to estimate 
Isolate | Accuracy | Isolation classification rate 
Isolate | Accuracy | Isolation misclassification rate 
Isolate | Resolution | Size of isolation set 
Isolate | Stability | Isolation stability factor 

In order to make Table 6 presentable, we have 
selected the most important diagnostic measure(s) for 
each objective. This ignores many of the points that 
have been made within this article, and the table should 
only be considered within that context. For example, 
for MTTR we have listed accuracy and specificity. This 
is not to say that timeliness is unimportant; timeliness 
is essential, as has been pointed out within our 
discussion. 

Note that several categories have been listed as not 
defined. For example, the ease of using an information 
system (IS) is not clearly defined within the diagnostics 
development community. One of the user objectives, 
minimizing alarm dissonance, requires a more 
systems-level approach than can be provided by listing 
a single diagnostic. 

1) Diagnostics for logistics 

All of the measures in Table 5 directly or indirectly 
impact some of the previously described user metrics. 
The logistics user goals and metrics (Table 1) relevant 
to diagnostics are related below to the appropriate 
diagnostic metrics. 


Reduce repair turn-around time (L.1) - The user goal of 
reducing repair turn-around time as measured by the 
mean time to repair can be facilitated by maximizing 
the accuracy of fault detection and isolation. An 
automated diagnostic system that can pinpoint the fault 
type and faulty sub-system component will save 
technicians time in locating the root cause of the fault 
symptom. The reduction in repair turn-around time 
depends on the ratio of the time spent diagnosing to the 
total time of diagnosis and repair. Conversely, a bad 
diagnosis system will mislead repair personnel and 
potentially adversely impact the repair turn-around 
time. Because such misdirections will occasionally 
occur, it is important to evaluate the mean of the 
reduction in repair turn-around time. If the deviation 
about this mean is too high, the repair personnel may 
stop using the system out of frustration. 
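
As a minimal sketch of how the mean and deviation of the turn-around reduction might be evaluated, consider the following Python fragment. The repair records, the function names and the choice of the sample standard deviation as the measure of deviation are illustrative assumptions and are not taken from the surveyed literature.

```python
from statistics import mean, stdev


def diagnosis_fraction(diagnosis_hours, repair_hours):
    """Share of the total turn-around time that is spent diagnosing."""
    return diagnosis_hours / (diagnosis_hours + repair_hours)


def turnaround_reductions(baseline_hours, assisted_hours):
    """Fractional reduction in turn-around time for each paired repair job."""
    return [(b - a) / b for b, a in zip(baseline_hours, assisted_hours)]


# Hypothetical paired records (hours) without and with the diagnostic aid.
baseline = [10.0, 8.0, 12.0, 6.0]
assisted = [7.0, 6.0, 15.0, 3.0]  # third job: a misdirection lengthened the repair

reductions = turnaround_reductions(baseline, assisted)
mean_reduction = mean(reductions)   # average benefit over all jobs
spread = stdev(reductions)          # sample standard deviation about the mean

print(f"mean reduction: {mean_reduction:.2f}, deviation: {spread:.2f}")
```

In this made-up data set a single misdirection makes the deviation larger than the mean benefit, which is the situation in which technicians may lose confidence in the system.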

Reduce ground support equipment and personnel (L.1) 
- The goal of reducing ground support/footprint, as 
measured by the number of ground support personnel 
and by the amount of equipment required to 
diagnose a fault, is related to all the entries of Table 5. 
If the diagnosis system is quick enough to transmit 
logistics requests prior to landing, and if the diagnosis 
is accurate and has high enough specificity (resolution), 
then the right test equipment may be made available at 
the right time via on-board diagnostics telecasting the 
appropriate information to maintenance and logistics. 

Reduce labor (L.4) - Reducing labor as measured in 
aggregate labor hours is enabled by ensuring accurate 
detection and isolation diagnosis as well as a timely 
solution. If the detection and isolation algorithms take 
longer to find a solution than the nominal labor 
required to discover root cause, then the system is a 
failure. Additionally, if the diagnosis or isolation is 
wrong too many times, technicians will spend more 
time to enact repairs and will eventually terminate 
usage of the diagnostic system. 
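
The break-even point implied by this argument can be sketched with a simple expected-labor model. The model below is our own illustrative assumption, not a result from the cited studies: a correct automated diagnosis costs `t_auto` hours, while a wrong one costs the same `t_auto` hours plus `t_wasted` hours of misdirected work plus the nominal manual effort `t_manual`.

```python
def expected_labor(accuracy, t_manual, t_auto, t_wasted):
    """Expected labor hours per fault when the diagnostic system is used.

    accuracy : probability that the automated diagnosis is correct
    t_manual : nominal hours to find the root cause without the system
    t_auto   : hours consumed by the automated diagnosis itself
    t_wasted : extra hours lost acting on a wrong diagnosis
    """
    return t_auto + (1.0 - accuracy) * (t_wasted + t_manual)


def break_even_accuracy(t_manual, t_auto, t_wasted):
    """Accuracy above which the system saves labor under this model.

    A value greater than 1 means the system can never pay off, e.g. when
    the algorithms take longer than the nominal manual diagnosis.
    """
    return 1.0 - (t_manual - t_auto) / (t_manual + t_wasted)


# Hypothetical numbers: 4 h manual diagnosis, 0.5 h automated, 2 h wasted per miss.
threshold = break_even_accuracy(4.0, 0.5, 2.0)
print(f"break-even accuracy: {threshold:.3f}")
print(f"labor at 90% accuracy: {expected_labor(0.9, 4.0, 0.5, 2.0):.2f} h")
```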

An independent technical assessment of software for 
the F-22 determined that the acquisition activity failed 
to properly interpret and implement fault detection and 
fault isolation requirements [45]. In particular, the 
following software capability gaps in the integrated 
diagnostics were highlighted: 

1. Test coverage 

2. Correlating faults to failures 

a. ability to isolate failures 

b. determining the consequence of a failure 

3. Fraction of false alarms / false positives 

4. Software health management - diagnostic 
environments that monitor software for faults are 
immature. 

Table 6. Diagnostic Mapping Summary 

User Community Goals/Metrics | Diagnostic Saliency 
Logistics | 
Min. MTTR | Max. accuracy & specificity 
Min. ground support | Max. specificity 
Min. labor hours | Max. accuracy & specificity 
Ease of use of IS | Not defined 
Minimize inventory | Max. accuracy & isolation 
Flight | 
Min. false alarms | Max. accuracy 
Min. info overload | Max. accuracy & specificity 
Enable info filtering | Max. specificity 
Min. alarm conflicts | Max. accuracy 
Min. alarm dissonance | System level issue 
Max. alert time from failure | Timeliness 
Max. safety | All 
Maintenance | 
Min. CND & RTOK | Accuracy & isolation 
Reduce look-up time | Not defined 
Accurate fault location | Max. isolation and distance to fault 
Min. HMS maintenance | Not defined 
Max. fault coverage | Max. coverage 
Fleet | 
Max. efficiency | Accuracy & specificity 
Vehicle CBM | Accuracy 
Aid business decisions | Not defined 
Improve design | Not defined 


Test coverage refers to how many of the physical 
system failure modes are included within the scope of 
the diagnosis algorithms. The size of the isolation set 
(Table 5) refers to how many modes within the model 
scope are reported in a candidate set (size of the 
ambiguity group). 
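
The two quantities just defined can be illustrated with a short sketch. The failure-mode names and the reported candidate sets below are hypothetical, and averaging the candidate-set size over reports is one possible summary chosen here for illustration.

```python
def coverage_fraction(physical_modes, modeled_modes):
    """Fraction of physical failure modes that fall within the model scope."""
    physical = set(physical_modes)
    return len(physical & set(modeled_modes)) / len(physical)


def mean_isolation_set_size(candidate_sets):
    """Average size of the reported candidate set (ambiguity group)."""
    return sum(len(c) for c in candidate_sets) / len(candidate_sets)


# Hypothetical physical failure modes and the subset covered by the algorithms.
physical = {"pump_stuck", "valve_leak", "sensor_drift", "wire_chafe", "bearing_wear"}
modeled = {"pump_stuck", "valve_leak", "sensor_drift"}

# Hypothetical candidate sets reported for three separate fault events.
reports = [
    {"pump_stuck"},
    {"valve_leak", "sensor_drift"},
    {"pump_stuck", "valve_leak", "sensor_drift"},
]

coverage = coverage_fraction(physical, modeled)
ambiguity = mean_isolation_set_size(reports)
print(f"test coverage: {coverage:.0%}, mean isolation set size: {ambiguity:.1f}")
```

A smaller average isolation set means finer resolution; a set of size one identifies a single candidate fault mode.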

Ease of using entire information system (L.8) - The 
ease of use of the information systems associated with 
all aspects of the life-cycle is very difficult to measure 
and has many different meanings. For our purposes, 
we will relate this to the diagnostic objective of 
minimizing time to respond. It is very difficult and 
subjective to measure the performance of an 
information system from user perspectives. For 
example, the information needs, access rights and even 
processing operations vary greatly from logistics, to 
maintenance and fleet management. Fleet managers 
may need annualized aggregated statistics whereas 
maintenance personnel need access to an individual 
vehicle’s repair history and to OEM part replacement 
procedures. 


Minimize inventory (just in time) (L.12) - One of the 
ways to reduce the need for a large inventory of spare 
parts is to have a method by which repairs are initiated 
such that only those parts which need replacement are 
swapped out. Often during a diagnostic 
procedure, a technician will need to swap out parts to 
try to localize the root cause of the fault. With a 
diagnostic system capable of accurate fault isolation 
this behavior of part swapping can be reduced thus 
impacting the inventory metric. Prognostics can have 
an even greater impact on minimizing required 
inventory by predicting wear trends. 

2) Diagnostics for flight 

Automated diagnostics for the flight crew has more 
stringent requirements for timeliness of reporting 
than logistics requires. The crew needs enough time to 
be able to either resolve the fault condition or to 
respond and plan for an emergency landing. Another 
metric within Table 5 that pertains to diagnostics for 
the flight crew is the measure of the number of false 
alarms. 

Minimize cockpit false alarm rate (F.1) - The 
minimization of cockpit false alarms as measured by 
the time between false alarms maps directly to the 
detection false positive rates. The metric of time 
between false alarms is not necessarily the optimal 
measure, because not all false alarms will be treated 
equally by the crew. A measure of criticality needs to 
be added to this metric. 
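
One way such a criticality measure could be attached to the metric is sketched below. The alert categories and their weights are hypothetical assumptions introduced only for illustration; the surveyed sources do not prescribe a weighting.

```python
def mean_time_between_false_alarms(flight_hours, false_alarms):
    """Unweighted metric: flight hours per false alarm."""
    count = len(false_alarms)
    return float("inf") if count == 0 else flight_hours / count


def weighted_false_alarm_rate(flight_hours, false_alarms, weights):
    """Criticality-weighted false alarms per flight hour."""
    return sum(weights[level] for level in false_alarms) / flight_hours


# Hypothetical criticality weights per alert level (an assumption).
weights = {"advisory": 1.0, "caution": 3.0, "warning": 10.0}

# Two hypothetical logs with the same number of false alarms in 1000 flight hours.
log_a = ["advisory", "advisory", "advisory", "advisory"]
log_b = ["warning", "warning", "advisory", "advisory"]

for name, log in (("A", log_a), ("B", log_b)):
    mtbfa = mean_time_between_false_alarms(1000.0, log)
    rate = weighted_false_alarm_rate(1000.0, log, weights)
    print(f"log {name}: {mtbfa:.0f} h between false alarms, weighted rate {rate:.3f}/h")
```

Both logs give the same time between false alarms, yet the weighted rate separates them, which is the distinction the unweighted metric cannot express.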

Minimize cockpit information overload (F.2) - 
Information overload can cause crew to miss critical 
messages and can create patterns of behavior 
whereby ignoring messages is rewarded due to 
misinformation. In part this can be alleviated by 
improving the accuracy and specificity of the provided 
diagnostic information. Additionally, since different 
user preferences will prevail, there needs to be 
information filtering capabilities. 

Enable cockpit information filtering of critical alarms 
as measured by pilot’s satisfaction (F.3, F.4) - The 
capability to filter critical cockpit alarms can be 
measured by surveying pilot satisfaction. Whenever a 
metric involves measuring human satisfaction, the 
complexities can be enormous. The ability to filter 
messages can be considered independent from the 
diagnostic system as long as inaccuracies are mitigated. 
It is often the case that human factor issues are not 
adequately considered when diagnostics are developed 
at the sub-system level. These human centric issues 
become more apparent at a systems integration level. 

Minimize alarm conflicts as measured by number of 
conflicting alarms and minimize alarm dissonance as 
measured by number of alarms that have disparity (F.5, 
F.6) - The number of conflicting alarms and the number 
of alarms that have disparity can be indicators of 
overall system integration. Many times diagnostics are 
developed independently for sub-systems by different 
vendors and then the central diagnostic system is 
responsible for reconciliation of all of those systems. 
The conflicts definitely arise from the error statistics of 
the individual sub-systems but there is a higher level of 
functionality that is not represented individually. The 
performance of minimizing conflicts can be measured 
by the accuracy and resolution of the integrated system. 
Alarm conflicts may involve dissonant information 
creating conflict between the control tower and the 
advisory cockpit warnings. 

Even in the absence of control tower communications, 
cognitive dissonance resulting from alarms may cause a 
loss of situational awareness among the crew members 
and lead to incorrect actions being taken. This level of 
system integration is typically beyond scope of the 
majority of diagnostic and prognostic engineers. 

Maximize time from first alert to failure as measured by 
time to failure or when landing becomes difficult (F.7) - 
Maximizing the in-flight timeliness of a diagnostic is 
critical both to giving the flight crew adequate time to 
plan and respond and to giving the ground logistics 
time to implement a maintenance plan. Typically there 
is a trade-off between early detection and false alarms. 
It is frequently the case that early detection can only be 
made when more false alarms are allowed to be 
incurred. This trade-space needs to be weighed 
carefully with respect to the criticality of the failure and 
the amount of time really required to prepare for a safe 
landing. 
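
The trade-off between early detection and false alarms can be illustrated with a simple threshold detector applied to a synthetic residual signal. Everything in this sketch is an assumption made for illustration: the Gaussian noise, the linearly growing fault signature, the fault onset sample and the threshold values.

```python
import random


def evaluate_threshold(residual, fault_onset, threshold):
    """Return (false_alarm_count, detection_delay) for a threshold detector.

    false_alarm_count : samples above the threshold before the fault begins
    detection_delay   : samples from fault onset to the first exceedance,
                        or None if the fault is never detected
    """
    false_alarms = sum(1 for r in residual[:fault_onset] if r > threshold)
    delay = next(
        (i for i, r in enumerate(residual[fault_onset:]) if r > threshold),
        None,
    )
    return false_alarms, delay


rng = random.Random(0)  # fixed seed so the sketch is repeatable
fault_onset = 200

# Unit-variance noise plus a fault signature that grows slowly after onset.
residual = [
    rng.gauss(0.0, 1.0) + 0.05 * max(0, k - fault_onset) for k in range(400)
]

results = {}
for threshold in (1.5, 2.5, 3.5):
    results[threshold] = evaluate_threshold(residual, fault_onset, threshold)
    false_alarms, delay = results[threshold]
    print(f"threshold {threshold}: {false_alarms} false alarms, delay {delay} samples")
```

For any fixed signal, lowering the threshold can only add false alarms and can only shorten the detection delay, so the choice of threshold has to reflect the criticality of the failure and the time actually needed to prepare for a safe landing.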

Safety as measured by number of incidents (e.g. smoke 
events) or number of aborted flights (F.8-F.10). - 
Obviously underlying all improvements in all of the 
other categories is the need to always be maintaining or 
improving safety margins. All aspects of diagnostics 
relate to safety. 

3) Diagnostics for maintenance 

Decrease incidents of cannot duplicate (CND) logs and 
retests OK (RTOK) (M.1) - The maintenance health 
management user metric for the number of CND logs 
will be positively impacted by accurate fault detection 
and isolation. 

Reduce maintenance look-up time (M.5) - Legacy 
systems can make even the simplest task take 
considerable time. For example, repairing a broken 
sensor wire requires that the maintenance personnel be 
able to lookup that particular sub-system in the OEM 
manuals to determine the wire type, correct size and 
routing. This information can be buried in obscure 
encodings and difficult to use manuals that are not 
readily accessible electronically in the maintenance 
bay. As diagnostic systems become more 
sophisticated, it is important that they make the 
necessary information immediately available to those 
personnel that will facilitate the repair. 

Fault location (M.6) - Fault location is a bit trickier to 
map directly to Table 5, which lists fault isolation. 
Fault isolation in some sense implies more of a discrete 
state-space approach. There are certainly subsystems, 
such as electrical wiring, in which fault isolation 
and fault localization are different. For example, fault 
isolation determines which wire or wire bundle (or 
connector) is responsible for the given fault symptoms, 
whereas fault localization specifies the precise location 
(distance to fault) of the damage on the wire 
responsible for the fault. This will become an 
increasingly important distinction as arc fault circuit 
breakers come into operation. An arc fault circuit 
breaker is designed to interrupt the circuit once an 
arcing condition has been detected. Unfortunately, by 
the time arcing has been detected, there will be damage 
present on at least one wire. This damage will typically 
be just a small spot (a consequence of an effective 
breaker) and may be very difficult to find via visual 
inspection without location information. 

Health management system maintenance (M.7) - 
Another aspect that is unique to diagnosis is the 
maintenance required to maintain the health of the 
diagnostic health management system. Although this 
does not appear in Table 5, the maintenance objectives 
for the diagnostic health management system need to 
be one of the factors within the model metrics. It is 
important that such issues as sensor fatigue/failure be 
diagnosed appropriately rather than misclassified as a 
fault with the system that the sensor(s) is measuring. 
Although time and money savings will be realized 
through a healthy health management system, if the 
maintenance of the HMS consumes all of these savings 
then the net result has been to increase risk to safe 
operation of the vehicle. Another application where 
fault localization is of great importance is structural 
health management. 

Fault coverage (M.8) - Fault coverage for discrete fault 
states indicates the percentage of faults that the 
diagnosis system is able to detect and diagnose. It is 
important that the fault coverage includes the health 
management system itself so that technicians are better 
able to direct their attention to the appropriate 
sub-system. For continuous fault states, the coverage 
indicates the ability to detect faults within acceptable 
limits. Fault coverage is impacted by the resolution of 
the diagnostic system. A system that has broad 
coverage but is not able to localize will not have much 
of an impact on turn-around time. This is also related 
to the isolation set which determines the resolution of 
the diagnoses. 

4) Diagnostics for fleet management 

Diagnostics for fleet management has the potential to 
reduce the number of maintenance hours and thereby 
positively impact the user metrics of mean turn-around 
time and hours of unscheduled maintenance, although 
the number of maintenance activities will not likely 
decrease. Ultimately, the other fleet user objectives 
and additionally the unscheduled maintenance metric 
will be impacted by an effective prognostic system. 

Efficiency (FM.4) - All systems on a vehicle may be 
running within nominal operating ranges but peak 
efficiency may not be achieved when some systems are 
near the edge of nominal behavior. The ability to trend 
these in prognostics will have the greatest impact on 
improving and maintaining optimal performance 
efficiencies. 

Vehicle targeted CBM (FM.5) - Condition based 
maintenance, with sufficient information technology 
infrastructure, can be targeted to individual vehicles 
making it possible to optimally maintain a vehicle 
based upon its history as well as operating context. An 
accurate and specific diagnosis system integrated 
within a larger information system can enable vehicle 
targeted CBM. 

Increase availability (FM.7) - Diagnostics can aid in 
increasing average fleet availability by minimizing the 
mean time to repair (by providing accurate diagnoses). 
Prognostics will have an even greater impact by 
minimizing the down time attributable to unscheduled 
maintenance fleetwide. 

Aid business and regulatory decisions (FM.10) - Well 
thought out system integration is essential for 
diagnostics to be able to impact business decisions. For 
example, as a fleet ages and vehicles start to exceed the 
original expected life, failures may start to be 
diagnosed in a few vehicles that are both a source of 
unscheduled maintenance and indicative of a bad 
trend. These fleet-wide diagnosis trends can be 
analyzed to determine the optimal time to 
schedule replacement of parts in the non-failed part of 
the fleet prior to failure, even without prognostics or 
degradation trending. 

Improve design and qualifications (FM.11) - As parts 
are diagnosed as failing, there may be fleet wide 
occurrences of component failures that were not 
expected by the engineers. A diagnostic system that is 
well integrated into a fleet wide information system can 
alert engineering that an analysis needs to be performed 
to determine if these components will continue to fail at 
an unexpected rate thus warranting a design 
improvement. 

5.2 Prognostics 

DEFINITION: Prognostics is defined as the ability to 
detect, isolate and diagnose mechanical and electrical 
faults in components as well as predict and trend the 
accurate remaining useful life (RUL) of those 
components [9]. 

The bottom line for prognostics is to predict the 
remaining useful life of components and consumables as 
accurately and as far in advance as possible to aid in 
logistics management, maintenance planning, crew 
alerting (impending failure) and fleet-wide planning. 
From a maintenance perspective: 


“The goal of the prognostics portion of PHM is to 
detect the early onset of failure conditions, monitor 
them until just prior to failure, and inform maintenance 
of impending failures with enough time to plan for 
them. This will, in effect, eliminate the need for many 
of the inspections, as well as provide enough of a lead 
time to schedule the maintenance at a convenient time 
and to order spare parts in advance.” [27] 

The prognostics discussion will for the most part not 
overlap with the previous discussion of diagnostics 
even though many of the points are strongly inter- 
related. Specifically, if a system cannot reliably detect 
a fault useful for diagnosis it will prove very difficult to 
accurately assess the remaining useful life of such a 
component. For diagnosis (Table 5) we broke the field 
into two types: detect and isolate. For each of these 
types we had measures for time, accuracy, 
sensitivity/resolution and stability. The model metrics 
for prognostics are taken from two overviews of 
performance metrics: [2] and [3] as summarized in 
Table 7. 

The detect category for prognosis has a different 
meaning from detect in diagnosis. As an example, 
consider the meaning of false positive for each. A false 
positive in diagnosis detection means that the diagnosis 
system detected and indicated a fault where none 
existed. However, a false positive in prognosis means 
that a prediction of failure is unacceptably early 
resulting in loss of usable service life. Thus, prognosis 
detection is with respect to a time horizon which 
depends on user requirements. Typically the notion of 
detection in diagnosis is not relative to a time horizon. 
For prognosis we have added two more types to detect 
and isolate: predict and effectivity. Similar to the 
diagnosis table, within these categories are objectives 
related to accuracy, time, sensitivity and effectiveness. 
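
The prognostic notion of a false positive can be sketched as a comparison between the predicted and the actual failure time against a user-set tolerance. The labels, the tolerance value and the treatment of late predictions below are illustrative assumptions rather than definitions taken from [2] or [3].

```python
def classify_prediction(predicted_failure, actual_failure, early_tolerance):
    """Label a failure-time prediction relative to a user-defined tolerance.

    "early"      : predicted more than early_tolerance before the actual
                   failure (a false positive in the prognostic sense,
                   since usable service life is lost)
    "late"       : predicted after the actual failure (assumed here to be
                   the counterpart case, where the failure is not averted)
    "acceptable" : predicted within the tolerance window before failure
    """
    lost_life = actual_failure - predicted_failure
    if lost_life > early_tolerance:
        return "early"
    if lost_life < 0:
        return "late"
    return "acceptable"


# Hypothetical component that actually fails at 1000 operating hours,
# with the user accepting at most 100 hours of lost service life.
for predicted in (800.0, 950.0, 1050.0):
    label = classify_prediction(predicted, 1000.0, 100.0)
    print(f"predicted failure at {predicted:.0f} h: {label}")
```

Because the tolerance is set by the user, the same prediction may count as acceptable for one user community and as a false positive for another.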

The last type, effectivity, relates very much to the 
engineering design trade-space. Designing a system to 
make diagnosis and prognosis easier is an extensive 
subject that is beyond the scope of this paper. 
However, the effectivity section is very much related to 
the cost benefits analysis discussion. 

The metrics employed in prognostic algorithm research 
and development are shown in Table 7, as taken from 
[2] and [3]. The mapping between the user goals and 
the prognostic metrics is listed in Table 8. As was 
true with the mapping from user objectives to 
diagnosis, the purpose of this mapping is to present the 
relationship between published user objectives and the 
performance measures used to drive prognostic 
algorithm research and development. We will not 
repeat elements covered in the diagnosis discussion; 
instead, we highlight items specific to prognosis. The 
considerations and caveats stated for Table 6 apply 
to Table 8 as well. Higher level functions such as 
business analytics and decision support systems, which 
depend on prognostic outputs, have not been defined 
within the prognostics measures. 

1) Prognostics for logistics 

The logistics goals that are unique to prognosis, as 
opposed to diagnosis, are discussed below. 

Increase availability/decrease unscheduled 
maintenance (L.3) - Decreasing unscheduled 
maintenance (and therefore increasing availability) is 
directly enabled through accurate degradation trend 
prediction combined with adequate time horizons. The 
time horizon needs to be long enough to allow for 
proper scheduling of maintenance and logistics as well 
as to plan for the usage of replacement aircraft. 
Accuracy in the prediction of remaining useful life is 
critical to avoid wasting part life through premature 
replacement. Estimates that are too early result in 
replacement of good parts and unnecessary downtime; 
estimates that are too late are even worse, because 
parts fail that would otherwise have been replaced 
before failure. The positive impact of prognostics on 
availability is great, but the risk posed by inaccurate 
prognostics is equally great. 

Reduce periodic inspections (L.5) - The U.S. Air Force 
is employing condition based maintenance techniques 
combined with usage predictions to change the 
frequency of inspections and replacements based upon 
usage. The impact of such technology on the 
commercial sector depends largely on regulatory affairs 
(FAA). The current regulations need to take into 
account the ability to monitor and predict degradation 
trends as a means to reduce periodic inspections. 
Accuracy of the predictions of remaining useful life is 
essential to avoid replacing parts that still have good 
life left and to avoid unscheduled maintenance due to 
unpredicted failures. 

Predict remaining useful life in components, maximize 
component life usage and tracking (L.6) - Maximizing 
component life usage means having the ability to 
accurately know when an isolated component will fail 
with enough lead time so as to be able to schedule 
replacement. Obviously this relies upon accuracy of 
predictions as well as having an adequate time horizon 
and being able to isolate trends to specific components. 

CBM - Schedule regular maintenance only as 
necessary - Predict remaining useful life in 
expendables (e.g., oil) (L.7) - One of the first 
applications of prognostics has been in assessing the 
state of consumables such as oil. Oil can be monitored 
for its quality, for contaminants, and for quantity. The 
trend of the degradation of the oil can then be predicted 
and used to optimally schedule maintenance for 
replacement/renewal. 
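
The oil-monitoring case above can be sketched as a simple trend extrapolation. This is a minimal sketch under strong assumptions: a single scalar quality index that decreases over time, a linear degradation trend, and a fixed replacement threshold. The function name, the data and the threshold value are hypothetical and do not come from the cited work.

```python
def linear_rul(times, values, threshold):
    """Estimate time remaining until a falling linear trend reaches threshold.

    Fits values = slope * time + intercept by ordinary least squares and
    extrapolates from the last sample. Returns None when no downward
    trend can be fitted (flat or improving index, or too few time points).
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        return None  # all samples share one time stamp; no trend to fit
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    if slope >= 0:
        return None  # index is not degrading toward the threshold
    intercept = mean_v - slope * mean_t
    crossing_time = (threshold - intercept) / slope
    # A threshold that has already been crossed yields zero remaining life.
    return max(0.0, crossing_time - times[-1])


# Hypothetical oil quality index sampled every 10 operating hours.
hours = [0.0, 10.0, 20.0, 30.0]
quality = [100.0, 90.0, 80.0, 70.0]
print(linear_rul(hours, quality, threshold=40.0))  # 30.0
```

A point estimate of this kind says nothing about uncertainty; a fielded system would also need to report the confidence in the extrapolated value.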

Provide surge capacity (L.9) - The ability to delay or 
adjust maintenance windows provides the capability of 
supporting surges in operations. Accurate health 
predictions aid in understanding the limits to possible 
delays and adjustments. 


Table 7. Prognostic Model Metrics [2] and [3] 

Type         Prognostic Objectives              Model Metrics
-----------  ---------------------------------  ----------------------------------
Detect       Accuracy of characterization       Early prediction, late prediction
             (with respect to time window)
             Missed estimation rate             # missed detections / total #
                                                prognoses
Predict      Accuracy of predicted remaining    Accuracy at specific times
             useful life                        (error, average error)
             Minimize sensitivity               Sampling rate robustness
             Precision                          Ratio of precision to horizon
                                                length, standard deviation
             Hit rate                           # correct prognoses / total # of
                                                prognoses
             Timeliness                         Prognostic horizon, accuracy at
                                                specific times, convergence rate
Isolate      Phase difference between samples   Anomaly correlation coefficient
             and prediction
             Precise correct estimation rate    # correct prognoses without
                                                adequate resolution
             Minimize number of required        Reduced feature set robustness
             sensors
             Minimize amount of data needed     Data frame size
Effectivity  Prognosis effectivity              # avoided unsched. maint. events /
                                                total # of possible unsched.
                                                events for component
             Average bias                       Average wasted life of component
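
Several of the ratio metrics in Table 7 can be computed directly from field records. The sketch below is illustrative only: the outcome labels ("correct", "missed", "early"), the function names and the sample numbers are assumptions made here, and [2] and [3] should be consulted for the precise definitions.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def summarize(outcomes, wasted_life_hours, avoided_events, possible_events):
    """Compute a few Table 7 style ratios from hypothetical records.

    outcomes          -- one label per prognosis: "correct", "missed" or "early"
    wasted_life_hours -- unused life of each component replaced before failure
    avoided_events    -- unscheduled maintenance events that were avoided
    possible_events   -- unscheduled events that could have occurred
    """
    total = len(outcomes)
    return {
        "hit_rate": ratio(outcomes.count("correct"), total),
        "missed_estimation_rate": ratio(outcomes.count("missed"), total),
        "prognosis_effectivity": ratio(avoided_events, possible_events),
        "average_wasted_life": ratio(sum(wasted_life_hours),
                                     len(wasted_life_hours)),
    }


# Hypothetical records for a single component type.
report = summarize(
    outcomes=["correct", "correct", "missed", "correct", "early"],
    wasted_life_hours=[10.0, 30.0, 20.0],
    avoided_events=3,
    possible_events=4,
)
for name, value in report.items():
    print(name, value)
```

Returning `None` for an empty denominator keeps an undefined ratio distinct from a true zero, which matters when a component type has not yet produced any prognoses.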


Reduce costs of reconfigurations and turn-arounds 
(L.10) - Unusual or unanticipated maintenance 
problems can result in costly reconfigurations of the 
supply chain or interruption of typical logistics 
processes. The ability to quickly and accurately identify 
the causes of faults and predict failures results in less 
disruption to established procedures and protocols, 
thereby saving time and money. Additionally, planned 
reconfigurations can be scheduled to incorporate 
preventive maintenance that might not otherwise have 
occurred without accurate predictions of the remaining 
useful life of components. 


Table 8. Prognostic Mapping Summary 

User Community Goals/Metrics     Prognostic Saliency
-------------------------------  -------------------------------
Logistics
  Max. mean time in service      Max. pred. accuracy & time
  Max. surge capacity            Accuracy of predictions
  Min. freq. of inspections      Accuracy of predictions
  Predict life remaining         Accuracy, time, isolation
  CBM                            Accuracy
  Max. vendor lead time          Accuracy and isolation
  Minimize inventory             Accuracy and timeliness
Flight
  Max. alert time from failure   Time horizon, isolation
Maintenance
  Reduce failures - MTBF         Accurate prediction
  Increase op after fault        Accurate trends and isolation
  Reduce damage incurred         Accurate trending
  Min. HMS maintenance           Effectivity
Fleet
  Max. life extension            Accuracy of predictions
  Min. unscheduled maint.        Accuracy & isolation
  Min. RMO costs                 Accuracy of predictions
  Spares analytics               Not defined
  Aid business decisions         Not defined


Maximize vendor lead time (L.11) - For parts that 
infrequently need to be replaced, but which require 
significant lead time for production and/or are 
expensive to maintain in inventory, the ability to predict 
the trend in degradation far in advance is important. 
This relies upon accurate prediction and isolation, 
with enough time horizon to facilitate the ordering of 
parts by logistics. 

Minimize inventory (L.12) - Minimizing the required 
inventory by transitioning to a just-in-time inventory 
system requires both an adequate time horizon in the 
remaining useful life estimate as well as specificity so 
that there is enough time to order the correct parts. 

2) Prognostics for flight 

Although all of the elements of Table 2 have been 
covered in the discussion of diagnostics, we would like 
to discuss again one of the elements that can be 
positively impacted by prognostics. 


Maximize time from first alert to failure (F.7) - Whereas 
diagnosis is responsible for detecting fault conditions, 
hopefully prior to full failure, prognosis is responsible 
for predicting the trend in degradation, resulting in an 
estimate of the remaining useful life along with 
appropriate estimates of uncertainty (confidence 
bounds). This distinction is critical in flight: it 
marks the difference between, for example, stating 
that a hydraulic pump is faulty and warning that the 
performance of the hydraulic pump is trending 
downwards but that the pump will be operational for 
several more hours. Thus an accurate estimate of 
remaining useful life provides the flight crew with more 
options, and gives logistics and maintenance more 
options for scheduling repairs. 

3) Prognostics for maintenance 

Reduce failures (M.2) - Currently, when a component or 
sub-system that is not expected to fail does fail, the 
result is unscheduled maintenance and downtime for the 
vehicle. The hope of prognostics is that some of these 
unscheduled maintenance activities may be mitigated 
by trend prediction of degradation. With accurate 
modeling of the trend, estimates of remaining useful 
life can be used to facilitate repair/replacement of 
components prior to failure, thus reducing 
unscheduled maintenance. The trade-off is that if the 
prognostic system is overly conservative, component 
life will be wasted. 

Increase operation after non-critical faults (M.3) - 
Maintenance for some faults that have either been 
detected or are trending can be safely deferred to 
avoid interrupting operations. It is critical that the 
detection of the fault be highly accurate and well 
isolated, and that the prediction of the trend also be 
highly accurate. The liability for an operator who 
ignores a fault because of misinformation from the 
prognostic system is high, so every available validation 
and redundant verification needs to be carried out. 

Reduce damage incurred (M.4) - Electrical arc fault 
interruption circuit breakers are designed to augment 
traditional thermal based circuit breakers by monitoring 
for the electrical signature associated with arcing events 
and then cutting off the current flow to the arcing wire. 
The extension to this is to incorporate a chafing 
detection system into these breakers to assess 
the state of wire insulation degradation, with the aim of 
providing both a remaining useful life estimate and 
a distance to fault assessment. The ability to detect 
chafes prior to an arcing event allows inspections 
and maintenance to be scheduled before damage 
occurs from arcing. 

Of course, underlying all of the remaining useful life 
(RUL) estimates produced by a prognostic system are 
stochastic processes. This requires that a 
probabilistic sensitivity analysis be included as part of 
the validation and verification process of new 
prognostic systems [46]. This also means that “saying 
a widget will fail in 100 hours is not sufficient. Saying 
that a widget will fail in 95 to 105 hours with 94 
percent confidence is much more useful.” [47]. 
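
One simple way to attach a confidence statement to an RUL estimate is to count how many samples of a predicted failure-time distribution fall inside a stated interval. The sketch below assumes such samples are already available, for example from repeated runs of a stochastic degradation model; the sample values and function names are hypothetical and are not taken from [46] or [47].

```python
def coverage(samples, low, high):
    """Return the fraction of RUL samples that fall inside [low, high]."""
    inside = sum(1 for s in samples if low <= s <= high)
    return inside / len(samples)


def describe(samples, low, high):
    """Format an interval statement of the kind quoted above."""
    percent = 100.0 * coverage(samples, low, high)
    return f"fails in {low} to {high} hours with {percent:.0f} percent confidence"


# Hypothetical RUL samples in hours from a stochastic prognostic model.
rul_samples = [93, 96, 98, 99, 100, 100, 101, 102, 104, 107]
print(describe(rul_samples, 95, 105))
```

With only ten samples the reported percentage is coarse; a probabilistic sensitivity analysis would normally rely on far more samples and on a validated model of the underlying process.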

4) Prognostics for fleet management 

Life extension - in service beyond expected service life 
(FM.1) - One of the consequences of operating a fleet 
beyond expected service life is an increase in 
maintenance, both scheduled (more frequent) and 
unscheduled. The promise of prognostics is that 
trend analyses can help mitigate unscheduled 
maintenance. More frequent maintenance may also be 
mitigated through careful monitoring if the appropriate 
regulatory authorities concur. 

Decrease unscheduled maintenance (FM.2) - This is 
one of the greatest promises of prognostics: the ability 
to trend and predict the remaining useful life of a 
component prior to failure. The accuracy and 
specificity of such a prediction can enable maintenance 
to be performed when convenient but prior to failure. 
This should reduce the number of unscheduled 
maintenance occurrences. 

Decrease ops costs (RMO) (FM.6) - A large factor in 
the RMO costs of an aging fleet stems from unscheduled 
maintenance. If this unscheduled maintenance can be 
mitigated with prognostics, then the RMO costs can be 
maintained at a more uniform level as the fleet 
ages. 

Provide surge capacity (FM.8) - The capability of 
being able to predict when a component will fail 
translates to the ability to schedule maintenance at a 
greater convenience. Greater flexibility in scheduling 
maintenance enables being able to provide planned 
surges in capacity. 

Spare part usage analytics and business decisions 
(FM.9) - Usage and repair analytics are a higher level 
analysis function that requires a highly integrated 
information system combined with the diagnostic and 
prognostic systems. 

6. DISCUSSION 

As a result of this survey we have found that, although 
the metrics associated with diagnostic and prognostic 
algorithm and system performance will positively 
impact the user community, there are gaps within 
the diagnostic and prognostic metrics. These gaps tend 
to fall within one of the following categories: 

• Process 

• System analysis 

• Data management 

• Verification and validation 

• Human factors 


Process - Large scale adoption of a fully integrated 
health management system requires buy-in from many 
different types of users as well as proper systems 
analysis methods to make the return on investment 
business case. Objectives and requirements should be 
generated with the inclusion and ownership of a broad 
spectrum of users. Cost-benefit analyses and education 
of users and management about the benefits of health 
management help with adoption. 

It is possible for the best ideas in health management 
system development and operation to be foiled by 
archaic business policies. Cost savings ideas such as 
“Replace only on failure” as pointed out in [19] will 
result in a health management system showing no 
benefits. In other words, one of the biggest obstacles to 
health management system adoption is the 
undocumented human element. Buy-in must be 
obtained at all user levels for successful adoption. 

System analysis - Another large obstacle is the 
development of sophisticated technologies without a 
view to the greater system. This problem is often 
encountered with engineering development efforts 
devoted to sub-system diagnostics and prognostics. 
This system level perspective is often not considered by 
researchers in diagnostics and prognostics. 

Each subsystem within diagnostics or prognostics can 
be engineered to successfully meet appropriate metrics 
but fail when verification and validation of the broader 
system are considered. This means that a broader 
perspective on verification and validation of total 
system health management is needed where the whole 
system requirement is greater than the sum of the user 
requirements. 

Data management - In addition to the system level 
perspective, there are integration issues, especially 
within the context of a broader information system, that 
will not be addressed directly by sub-system 
requirements or user requirements and yet will vastly 
impact perceptions of utility. Functions such as business 
analytics and decision systems are typically not directly 
considered with diagnostics and prognostics, and yet are 
direct consumers of the information that is sourced 
from such systems. 

Verification and validation - As OEMs start to 
outsource more subsystem developments, total 
system validation and verification becomes a greater 
challenge. In particular, the V & V of complex 
interacting software systems would benefit from a 
model based verification approach as adopted by 
hardware developers. Ofstun [14] also notes that 
IVHM functionality cannot be verified and validated 
in a laboratory alone, that incremental demonstrations 
need to be conducted, and that anomalies will occur 
after delivery, so the IVHM system needs to be easy 
to update. 

Human factors - The human element is hard to quantify 
and easier to ignore than other performance metrics. In 
particular, issues such as alarm dissonance and 
conflicts that derive from system-wide activities, and 
that are not directly measured by any particular 
subsystem metric, are hard to manage and mitigate. 

7. SUMMARY 

We have briefly surveyed the recent literature 
pertaining to user goals for aeronautic health 
management systems. We have compared these goals 
with the results from surveys of the objectives and 
metrics of diagnostic and prognostic method 
developments. Although many of the mappings have 
been concerned with diagnosis accuracy and isolation 
as well as the horizon and accuracy of prognosis 
prediction, some of the most interesting information is 
in the gap between user goals and the success metrics 
associated with diagnostics and prognostics. 

NASA’s Aviation Safety Program is investing in 
IVHM. NASA’s IVHM project 
(http://www.aeronautics.nasa.gov/programs_avsafe.htm) 
seeks to develop validated tools, technologies, and 
techniques for automated detection, diagnosis and 
prognosis that enable mitigation of adverse events 
during flight. The 
project includes a systems analysis aspect that assesses 
i) future directions and technology trends in research 
related to detection, diagnosis, prognosis, and 
mitigation as they pertain to the stated goals of the 
IVHM project, and ii) requirements for future aircraft 
and the issues arising from current and near-term 
aviation technology. We note that while the primary 
focus of the IVHM project is on-board, the health 
management objectives discussed in this paper impact 
the entire aircraft life-cycle. 

Other studies have developed lists of lessons learned 
with respect to aeronautic health management systems 
[14]. In particular, [48] points out that a holistic 
approach, which views the system as a whole rather than 
as a collection of parts, is essential. This is also true 
of generating user requirements and garnering 
broad organizational support. 

It is our hope that this survey of user objectives as well 
as the mapping from user objectives to diagnostic and 
prognostic metrics can further the widespread adoption 
of health management technologies within aeronautics. 